[P/D] add acc test script of hpu pd disagg #1394

Open
wants to merge 3 commits into
base: habana_main

@zhenwei-intel zhenwei-intel commented Jun 10, 2025

This PR adds disaggregated (disagg) vs non-disaggregated (baseline) accuracy tests. The changes include the addition of configuration files, a shell script to automate the test process, and a Python script to validate the outputs.

Signed-off-by: zhenwei <zhenweiliu@habana.ai>

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds a new pipeline to run and compare non-disaggregated (baseline) and disaggregated accuracy tests against a vLLM-based model.

  • Introduces a Python script (test_disagg_accuracy.py) to generate outputs, save baseline results, and verify exact matches in disagg mode.
  • Adds a Bash script (run_hpu_disagg_accuracy_test.sh) to orchestrate etcd, Mooncake, vLLM servers, and execute both test modes end-to-end.
  • Includes mooncake.json for MooncakeStoreConnector configuration.
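The baseline-then-compare flow summarized above can be sketched roughly as follows. This is a minimal illustration and not the code in test_disagg_accuracy.py: the function names `save_baseline` and `find_mismatches`, and the prompt-to-output JSON layout, are assumptions made for the example.

```python
import json
from pathlib import Path


def save_baseline(outputs: dict[str, str], path: Path) -> None:
    """Write baseline outputs (prompt -> generated text) to a JSON file."""
    # Assumed layout: one JSON object mapping each prompt to its output.
    path.write_text(
        json.dumps(outputs, indent=2, ensure_ascii=False), encoding="utf-8"
    )


def find_mismatches(
    baseline: dict[str, str], candidate: dict[str, str]
) -> list[str]:
    """Return the prompts whose candidate output differs from the baseline.

    A prompt that is missing from the candidate run counts as a mismatch.
    """
    return [
        prompt
        for prompt, expected in baseline.items()
        if candidate.get(prompt) != expected
    ]
```

In disagg mode, a test along these lines would load the saved baseline and fail if `find_mismatches` returns a non-empty list. Exact string equality is only meaningful if generation is deterministic in both modes, for example with greedy decoding.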

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pd_xpyd/test_disagg_accuracy.py | New client script: sends prompts, writes baseline JSON, and asserts disaggregated outputs match |
| pd_xpyd/run_hpu_disagg_accuracy_test.sh | Automation script: starts services, runs baseline/disagg tests, and cleans up |
| pd_xpyd/mooncake.json | Configuration for etcd/Mooncake key-value store connector |
Comments suppressed due to low confidence (3)

pd_xpyd/test_disagg_accuracy.py:148

  • In disagg mode you are reading the baseline file, so this should report a read error instead of "Error writing to file".
print(f"Error writing to file: {e}")
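One way to address this comment is to give the read path its own error message. The sketch below is an assumption about how that could look, not the PR's code: `load_baseline` is a hypothetical helper name, and raising `RuntimeError` instead of printing is a design choice made for the example.

```python
import json


def load_baseline(path: str) -> dict:
    """Load baseline outputs from JSON, reporting failures as read errors."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (OSError, json.JSONDecodeError) as e:
        # This path only reads the file, so the message says "reading".
        raise RuntimeError(f"Error reading baseline file {path}: {e}") from e
```

Raising instead of printing also stops the comparison from running against a baseline that was never loaded.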

pd_xpyd/test_disagg_accuracy.py:69

  • The docstring says "two optional string arguments" but --service_url and --model_name are required. Update the docstring to reflect the actual behavior and all arguments.
    """
    This script demonstrates how to accept two optional string arguments
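
For reference, a parser that matches the behavior described in this comment might look like the sketch below. Only the flag names `--service_url` and `--model_name` come from the review; the function name `parse_args`, the help strings, and the description are assumptions.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the two required string arguments named in the review comment."""
    parser = argparse.ArgumentParser(
        description="Send prompts to a serving endpoint and check outputs."
    )
    # Both flags are required, so the docstring should not call them optional.
    parser.add_argument(
        "--service_url", required=True, help="URL of the serving endpoint"
    )
    parser.add_argument(
        "--model_name", required=True, help="Name of the model to query"
    )
    return parser.parse_args(argv)
```

With `required=True`, argparse exits with a usage error when either flag is missing.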

pd_xpyd/run_hpu_disagg_accuracy_test.sh:23

  • [nitpick] This variable uses lowercase naming while others are uppercase. Consider renaming to MAX_NUM_BATCHED_TOKENS for consistency with the script’s style.
max_num_batched_tokens=2048

@jikunshang

@zhenwei-intel zhenwei-intel marked this pull request as draft June 16, 2025 06:52
Signed-off-by: zhenwei <zhenweiliu@habana.ai>
@zhenwei-intel (Author)

> do you think we can reuse this file? https://github.com/HabanaAI/vllm-fork/blob/habana_main/tests/kv_transfer/test_disagg.py

Not very reusable. This script runs the baseline and the P/D test with a single command, then checks whether the outputs are consistent.

@zhenwei-intel zhenwei-intel marked this pull request as ready for review June 17, 2025 03:03
@zhenwei-intel zhenwei-intel dismissed michalkuligowski’s stale review June 20, 2025 05:32

Currently, this script is being used as a demo for users or QA.

@zhenwei-intel zhenwei-intel enabled auto-merge (squash) June 20, 2025 05:34

4 participants